Abstract of JP 6019864 (A)
PURPOSE: To provide a mechanism for effectively
performing dynamic and autonomous grouping of
pickets. CONSTITUTION: This array processor is
provided with a structure for assigning pickets to
groups 1-8 as an array function executed in parallel
by all the active processing elements of the array,
and a mechanism for using grouping to select
certain pickets for an arithmetic operation unique
to the problem of the group. A memory location in
each picket is assigned to each group, in which a
bit is set or reset to indicate participation in
groups 1-8, and the memory location forms one part
of the processing elements, which themselves can be
replicated. In a system using a shared memory, one
part of the memory is assigned as a local memory,
and also as a large area memory, and then a local
relevant memory part is
CLAIMS



(57) [Claim(s)] 

[Claim 1] An array processor characterized by comprising the following, wherein each of
said processing elements comprises a picket containing an arithmetic operation element,
two or more registers, a local memory, and an interconnection means with other processing
elements in said array processor; the pickets are interconnected during operation of said
array processor; each picket has a means for dynamically assigning itself to one or more
picket groups so that data is processed individually within a group of pickets
(hereinafter called a "picket group"); every clock cycle, all the pickets in said array
processor receive an instruction from said array controller and execute that instruction;
and said local memory of each picket has one memory location corresponding to each picket
group, the state of which indicates whether the picket participates in the picket group
corresponding to that memory location:
An array controller.

Two or more processing elements which operate in SIMD operational mode.

[Claim 2] The array processor according to claim 1, wherein each picket has a douse mode
means for saving the internal state of the picket and a douse latch, and the contents of
the douse latch are changed according to a result of calculation within the picket.

[Claim 3] The array processor according to claim 2, wherein, in order to save each
picket's internal state, storing into said local memory of the picket is forbidden so that
only a result of calculation that affects the condition of the picket is loaded into said
douse latch, while other operations, including reading of said local memory and
arithmetic computation in the picket, are permitted to continue.



DETAILED DESCRIPTION 



[Detailed Description of the Invention] 
[0001] 

[Industrial Application] This invention relates to array processors including two or more
pickets and, more specifically, to a mechanism for grouping many pickets in a SIMD/MIMD
array processor in order to execute a program on the array processor.

[0002] A parallel associative processor system comprising 100 to 1000 processing elements
is described in related patent application U.S. Serial Number 611,594 (corresponding to
Japanese Patent Application No. Hei 3-278900). However, that patent application does not
disclose the improvements of this invention, including the douse latch.
[0003] Explanation of terms
- ALU: An ALU is the arithmetic operation part of each processing element.

[0004] - Array: An array is an arrangement of elements in one or more dimensions. From the
viewpoint of the hardware of a super parallel computer, an array is usually a set of
structures (processing elements) that have the same composition. When data is operated on
in parallel, each processing element to which an operation has been assigned can perform
that operation independently and in parallel as needed. Generally, an array can be
regarded as a lattice of processing elements.

[0005] - Array controller: An array controller is a unit programmed as a controller for an
array processor. It performs the function of a master controller for grouping the
processing elements arranged in the array processor.
[0006] - Array processor: There are two main architectures of array processor: multiple
instruction, multiple data stream (MIMD) and single instruction, multiple data stream
(SIMD). In an MIMD array processor, each processing element executes its own unique
instruction stream on its own data. In a SIMD array processor, by contrast, every
processing element is restricted to the same instruction from a common instruction
stream, but the data associated with each processing element is unique. The preferred
array processor of this invention is called the Advanced Parallel Array Processor (APAP),
and has other features.
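
The distinction between the two architectures can be illustrated with a minimal Python
sketch. It is only a software analogy under stated assumptions, not the hardware of this
invention: the function names `run_simd` and `run_mimd` and the sample operations are
hypothetical, and each list index stands for one processing element.

```python
from typing import Callable, List

Instruction = Callable[[int], int]


def run_simd(instruction: Instruction, data: List[int]) -> List[int]:
    """SIMD: one common instruction, applied by every PE to its own data."""
    return [instruction(x) for x in data]


def run_mimd(instructions: List[Instruction], data: List[int]) -> List[int]:
    """MIMD: each PE applies its own instruction to its own data."""
    return [op(x) for op, x in zip(instructions, data)]


data = [1, 2, 3, 4]  # one data item per (hypothetical) processing element

# Every PE executes the same "add 10" instruction.
simd_result = run_simd(lambda x: x + 10, data)

# Every PE executes a different instruction of its own.
mimd_result = run_mimd(
    [lambda x: x + 1, lambda x: x * 2, lambda x: -x, lambda x: x * x],
    data,
)
```

In both cases the data is unique to each element; only the source of the instruction
differs.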

[0007] - Controller: A controller is a device that directs the transmission of data and
instructions over the links of an interconnection network. Its operation is controlled
either by a program executed by the processor to which the controller is connected, or by
a program executed within the controller itself.
[0008] - Link: A link is a physical or logical element. A physical link is a physical
connection for joining two or more processing elements.
[0009] - MIMD: MIMD is an array processor architecture in which each processing element
in the array has its own instruction stream, so that the array as a whole has multiple
instruction streams, in order to process multiple data streams located one per processing
element.

[0010] - Module: A module is a discrete and identifiable program unit, or a functional
unit of hardware designed for use with other components. A set of processing elements
contained in a single electronic chip is also called a module.

[0011] - Node: Generally, a node is a junction of links. In a generic array of processing
elements (PE), one PE can be one node. A node may also contain a set of PEs called a
module. In a typical example, each node is preferably formed from eight processor memory
elements (PME).

[0012] - Node array: A set of modules composed of PMEs is called a node array; that is, it
is an array of nodes composed of two or more modules. One node array usually contains
more than a few PMEs.

[0013] - Picket: A picket is a component of an array processor. A picket corresponds to
1/n of one array processor and is a form of PME. The processor logic designed for the PME
chip of this invention can implement the picket logic described in the related patent
application cited above, or can provide the logic for an array of processors formed as
one node. The term "picket" is similar to the term "processing element (PE)" commonly
used in the field of array processors. A picket is an element of the array processor that
preferably combines a processing element and a local memory so that bytes of information
are processed bit-parallel within one clock cycle. In a typical example, a picket
comprises a byte-wide data flow processor (ALU + registers), a local memory with a
capacity of 32 K bytes or more, primitive controls, and a means for interconnecting the
picket with other pickets.

[0014] - Picket processor: A picket processor is a total system comprising an array of
pickets, an interconnection network, an I/O system, and a SIMD controller that consists of
a microprocessor, a canned-routine processor, and a microcontroller that operates the
array.

[0015] - Picket architecture: The picket architecture is a preferred embodiment of the
SIMD architecture, with features suited to several diverse kinds of problems. These
problems include set associative processing, parallel numerical processing, and
processing of physical arrays similar to images.

- PME: The term PME is used to mean a "processor memory element." A PME encompasses a
picket. One PME is 1/n of an array processor and comprises a processor, its associated
memory, a control interface, and a part of the interconnection network. A PME can have
the connectivity of a regular array, as in a picket processor, or can form part of a
subarray, as in a node of PMEs.
[0016] - Routing: Routing is the assignment of a physical path by which a message reaches
its destination. A routing assignment involves a source (origin) and a destination. These
elements or addresses have a temporary relationship or affinity. Message routing is often
based on a key obtained by referring to an assignment table. In a network, a destination
is any addressable processing element that is addressed as the destination of information
transmitted by a path control address identifying the link. The destination field of the
message header identifies the applicable destination.

[0017] - SIMD: SIMD is an array processor architecture in which all the processing
elements in the array are commanded from a single instruction stream to process multiple
data streams located one per processing element.
[0018] - SIMD/MIMD: SIMD/MIMD is a term for a computer with a dual function that can
switch from MIMD to SIMD for a certain period in order to process some complicated
instruction, and that therefore has two operational modes. When the super parallel
computer "Connection Machine 2 (CM-2)" of Thinking Machines Corporation is placed as the
front end or back end of an MIMD computer, the programmer can use the two operational
modes to execute different portions of one problem. These computers use a bus that
interconnects a master control processor with the other processors. The master control
processor has the capability to interrupt the processing of the other processors, and the
other processors can execute independent program code. During an interruption, a means
for checkpointing (closing and saving the current state of the controlled processors)
must be provided.
[0019] - SIMIMD: SIMIMD is an array processor architecture in which all the processing
elements in the array are commanded from a single instruction stream to process multiple
data streams located one per processing element. In this architecture, data-dependent
operations within each picket, which mimic instruction execution, are controlled by the
SIMD instruction stream. A SIMIMD computer is a single-instruction-stream computer with
the capability to sequence multiple instruction streams (one per picket) using the SIMD
instruction stream, and to process multiple data streams (one per picket). SIMIMD can be
executed by a PME system.
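
One way to picture the SIMIMD definition above is the following minimal Python sketch. It
is an illustrative assumption, not the mechanism of this invention: the broadcast `STEP`
command, the `Picket` class, and the per-picket `program` list are all hypothetical. The
controller issues one identical command per cycle, and each picket reacts by performing
the next operation of its own locally held sequence on its own data.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Picket:
    program: List[Callable[[int], int]]  # hypothetical per-picket operation sequence
    value: int                           # per-picket data
    pc: int = 0                          # position within the local sequence

    def step(self) -> None:
        """React to the broadcast STEP: perform this picket's own next operation."""
        if self.pc < len(self.program):
            self.value = self.program[self.pc](self.value)
            self.pc += 1


def broadcast_step(pickets: List[Picket]) -> None:
    """The single SIMD instruction: every picket receives the same STEP command."""
    for picket in pickets:
        picket.step()


pickets = [
    Picket(program=[lambda v: v + 1, lambda v: v * 2], value=3),
    Picket(program=[lambda v: v - 1, lambda v: v * v], value=3),
]

# Two controller cycles, one common command per cycle.
for _ in range(2):
    broadcast_step(pickets)

simimd_values = [p.value for p in pickets]
```

The controller never sends different instructions to different pickets; the divergence
comes entirely from what each picket holds locally.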

[0020] - Synchronous operation: Synchronous operation in an MIMD computer is an
operational mode in which each action is related to one event. This event is usually a
clock, but it may also be a specific event that occurs regularly within a program
sequence. When one operation is dispatched to two or more PEs, those PEs proceed to
perform the function independently. Control is not returned to the controller until the
operation is completed. If the request is made to an array of processing elements, the
controller generates the request to the processing elements in the array, and these
processing elements must complete their operations before control is returned to the
controller.
[0021] 

[Description of the Prior Art] In the never-ending quest for faster computers, engineers
have come to build super parallel computers by linking hundreds or even thousands of
low-cost microprocessors in parallel, in order to conquer the complicated problems that
stump today's computers. This invention relates to a new technique for building a super
parallel computer. The many improvements of this invention should be considered against
the background of the conventional technology. Until now, system trade-offs have been
needed to choose the architecture best suited to a specific application, and no
satisfactory solution has existed. The aim of this invention is to make a solution easier
to provide. That is, this invention relates to an array processor that can realize both
SIMD and MIMD operation.

[0022] Various United States patents related to SIMD computers are outlined below. None
of these patents discloses or suggests the mechanism according to this invention, i.e., a
mechanism for grouping many pickets in a SIMD and MIMD array in order to execute a
program on an array processor.



[0023] US,4783738,B addresses several aspects of autonomy. When a SIMD controller issues
one instruction to all the processing elements (PE) in an array processor, each PE can
modify a bit in the instruction, or insert a bit into it, as a result of its spatial or
data-dependent characteristics. Examples include ADD/SUB, SEND/RECEIVE, and, more
generally, OPA/OPB. This function is used in image processing for handling lines and for
handling the border of an image. This United States patent resembles one of the
autonomous functions that can be used in connection with the grouping of this invention,
in that this invention makes the ALU perform data-dependent operations. However, this
invention does not intend to modify or insert bits of an operation as this United States
patent does, and it does not regard the ALU function as a specific function of the
spatial position in the array processor. This United States patent also resembles some of
the mechanisms of this invention in that it concerns data-dependent functions. However,
whereas the data dependencies (sign or condition code) of an instruction sequence only
force the ALU, this invention realizes a DWIM (Do What I Mean) function in order to force
something else.

[0024] US,4736291,B describes an array transform processor that performs high-speed
processing of data arrays. It is optimized for performing the FFT (Fast Fourier
Transform) algorithm in the field of seismic analysis. The focus of this United States
patent is a system control bus shared by the bulk memory and up to 15 devices. Each
device has a writable control store, a program memory, a control unit, and a
device-dependent unit that provides the features unique to each of the 15 devices. An
array transform processor is not necessarily a parallel array processor, and this United
States patent does not refer to one. Array transforms can also be performed by the system
of this invention. However, the processor described in this United States patent contains
several subunits (stages) that process a data array in a complicated but repetitive way.
In contrast, in the SIMD array processor of this invention, the processing elements each
take an element of the data array and process the data in parallel.

[0025] US,4831519,B describes a SIMD array processor with interprocessor connections
extending to the left and right of each PE, so that several PEs of 16-bit width can be
connected together to accommodate various data formats effectively. For example, when a
64-bit floating point word is processed by four PEs, the high-order PE processes the
exponent (16 bits) of the word, and the remaining three PEs, connected together, process
the fraction (48 bits). Carry/borrow controls are ganged between the PEs to accomplish
this. One chip can carry 16 PEs for data processing, two PEs for address generation, and
two spare PEs. The I/O of this United States patent uses a four-level signal system to
combine two logic signals on one pin, generating one of four voltage levels depending on
the logic conditions of the two lines. Although the device of this United States patent
must provide a controller function, little is said about it. In order to handle data of
various sizes, the array processor is supplied with a global MASK that defines what to do
for the grouping of PEs, and with local NEST controls that propagate from a master PE to
the slave PEs on its right. However, this United States patent does not describe how to
determine to which group a PE belongs. Nor does it describe any local autonomy in the
sense of performing processing based on the data being processed rather than under NEST
control. This United States patent describes a possible design for a horizontally
expandable SIMD chip in which adjacent PEs are connected to operate as a single processor
on data of 16, 32, and 48 bit width. However, it does not describe or suggest how the
processing elements can be tied together so that adjacent PEs operating as a single
processor can process data of 16, 32, or 48 bit width, or how a processing element can
know or determine whether it should participate in a given process. This United States
patent has grouping commanded by the global control MASK, but it lacks local autonomy.
This will become clear if this United States patent is re-evaluated after this invention
is understood.

[0026] US,4783782,B describes the manufacturing test of a SIMD chip for isolating up to
two defective PEs in a SIMD array as described in US,4831519,B cited above. The error
data is stored in a PROM section on the chip. After the controller reads this error data,
the resources on the chip can be assigned dynamically. Thus, the chip of this United
States patent has only a limited processor autonomy.

[0027] US,4748585,B describes a mechanism for allocating the elements of a parallel
processor into segments so as to accommodate data of various lengths. This United States
patent is similar to US,4831519,B cited above, but it requires two or more uniprocessors
to be interconnected and grouped in order to process a word wider than each uniprocessor.
Each uniprocessor is complete in that it has a microsequencer, an ALU, registers, and so
on. The feature of this United States patent is that several uniprocessors can operate in
lock step in order to process a wide data word. Segmentation is controlled globally,
using a combine code and global condition codes. The device is an MIMD array and cannot
provide the capability of a SIMD array. This invention, by contrast, relates to tying
pickets together in an improved manner, not to the control of an MIMD array that ties
several uniprocessors together to build a wider processor as in this United States
patent.

[0028] US,4825359,B describes a processor for processing data arrays, for example for the
Fast Fourier Transform (FFT). This processor contains several processing operators, each
of which can be programmed to perform one step of the calculation. The processor can be
classified as a complex uniprocessor with several processing operators that operate as a
pipeline when executing complicated processes. This United States patent makes no mention
of grouping or autonomy, and appears intended only to provide some improvement over a
wide range of operators.
[0029] US,4905143,B describes an array processor that performs calculations for all
combinations of two types of variables and, using these calculation results, computes
recursive formulas with local data dependency. These calculations are characterized as
matching calculations based on the theory of dynamic time warping or dynamic programming,
as used for pattern matching in the field of speech recognition. The processor appears
intended to function as a kind of systolic MIMD array. The PEs are arranged in a ring and
pass intermediate results to the next PE in the ring. Each PE has its own instruction
memory and other means. This United States patent does not describe the autonomy of PEs
in a SIMD array, nor does it describe any grouping of PEs other than their physical
arrangement in a ring.

[0030] US,4910665,B explains a two-dimensional interconnection network for a SIMD array
processor in which each PE can directly access its eight neighboring elements. The
communication medium is a dot-OR connected network that interconnects four adjacent
elements at each corner. Although SIMD computers are disclosed in this United States
patent, it does not describe whether or how the local autonomy or grouping of PEs of this
invention could or should be provided.

[0031] US,4925311,B is equally unrelated to the realization of the mechanism of this
invention. In this United States patent, two or more processors in a multiprocessor
system can be assigned as a group to solve one problem. Each processor can add itself to
the group or remove itself from the group, and can also exchange messages, semaphores,
and other controls with other processors. Given its MIMD nature, this United States
patent says nothing about giving local autonomy to the PEs in a SIMD array as in this
invention. Instead, each processor in the multiprocessor system has a network interface
controller containing a RAM, a microprocessor, and some functional unit (such as a disc
controller). Grouping is controlled by the network interface of each processor. This
invention does not need such elaborate task division.

[0032] US,4943912,B describes an MIMD array processor connected as a NEWS network. After
a program is loaded into the memory of each PE in the array processor, the array
controller issues a procedure start command in order to identify the PEs that should
begin execution. Each PE contains a register holding two or more task patterns, and a
comparison means for comparing the task pattern of the PE with a global task pattern and
command. The result of this comparison is used either to select the program start point
within the PE or to put the PE into an idle state. However, this United States patent
does not suggest the autonomy in the PEs of a SIMD array of this invention. The
comparison means and its result could be used to classify or group PEs for different
parallel tasks. This invention, by contrast, elaborates on classifying or grouping, and
on how the pickets in a SIMD environment can be permitted to do this themselves. The
effect is to isolate some of the pickets and to activate them while one section of the
SIMD code is executed. The computer of this invention can then proceed to activate other
groups for processing.

[0033] US,4967340,B describes a systolic array processor. Each processing element in this
array processor comprises two registers, an adder, a multiplier, and three programmable
switches. To configure the array processor, the controller sets the switches in each
processing element. Data is then pumped through the processing elements to create the
desired result. This United States patent does not suggest the system of this invention.

[0034] US,5005120,B is directed to an array processor in the sense that two or more
processors exist. However, this United States patent only describes a time compensating
circuit used for aligning data in a bit-serial signal array processor. Each processing
element of this array processor comprises four bit-serial registers that supply data to
an ALU. The time compensating circuit is placed in front of the first register. This
array processor is completely unrelated to a SIMD computer.
[0035] US,5020059,B relates to a generalized interconnection scheme for array processors
that reconfigures the array so that defective PEs are removed, and that realizes tree and
other topologies within a basic two-dimensional mesh. This United States patent contains
no description or suggestion concerning the array control architecture, grouping, or any
aspect of the PEs, and therefore says nothing about autonomy in the PEs. In a SIMD
computer, exactly the same operation is conventionally performed in every picket of the
array processor. Selectively disabling or enabling the local autonomy of one or more
pickets is a concept that may be called the foundation of this invention. Despite all the
efforts of the conventional technologies described above, performing SIMD, MIMD, and
SIMIMD processes on one computer has not been realized until now, apart from the related
patent application cited above. The mechanisms needed for SIMIMD processes, floating
point arithmetic, and the like have not yet been fully developed in this technical field.
[0036] US,5045995,B describes a mechanism for enabling or disabling the function of each
PE of a SIMD array based on the data condition in each PE. When one global instruction is
issued to all the PEs, each PE samples a condition internally and enables or disables
itself through a bit of a status register. Other global instructions effectively swap
the state of the PEs. This function can be used to realize IF/THEN/ELSE and WHILE/DO
constructs. The state can also be stacked in order to support nested enabling conditions.
Thus, this United States patent is related to the enable/disable function of this
invention. However, this United States patent needs the following three elements.

1. An initial test, loading of a status bit, and enabling/disabling based on the status
bit.

2. An instruction for flipping all the enable/disable bits so that the set of the other
PEs is enabled.

3. Storage means for providing nesting.

The above function of this United States patent is needlessly complicated. The mechanism
of this invention, by contrast, can enable or disable many functions without combining
all three elements of this United States patent. Although this invention does not need
the flip function described in this United States patent, it can provide a system with a
wider range of capabilities than this United States patent provides. The SIMD computer of
this invention does not need to be combined with the functions of IF and ELSE
instructions, or with other features concerning video processing.
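
For orientation only, the three prior-art elements listed above (test and enable, flip,
and a stack for nesting) can be sketched in Python as follows. This is a hypothetical
software analogy of a conventional enable-mask scheme, not code from the cited patent and
not the mechanism of this invention; the class `EnableStackArray` and its method names are
assumptions made for illustration.

```python
from typing import Callable, List


class EnableStackArray:
    """Hypothetical SIMD array with a per-PE enable bit and a stack of saved masks."""

    def __init__(self, data: List[int]) -> None:
        self.data = list(data)
        self.enabled = [True] * len(self.data)
        self.stack: List[List[bool]] = []

    def test(self, cond: Callable[[int], bool]) -> None:
        # Element 3: save the current mask so that conditions can be nested.
        self.stack.append(list(self.enabled))
        # Element 1: each PE tests its own data and enables or disables itself.
        self.enabled = [e and cond(x) for e, x in zip(self.enabled, self.data)]

    def flip(self) -> None:
        # Element 2: enable the other PEs of the enclosing scope (the ELSE branch).
        outer = self.stack[-1]
        self.enabled = [o and not e for o, e in zip(outer, self.enabled)]

    def end(self) -> None:
        # Restore the mask that was active before the matching test().
        self.enabled = self.stack.pop()

    def apply(self, op: Callable[[int], int]) -> None:
        # One global instruction; only enabled PEs change their data.
        self.data = [op(x) if e else x for e, x in zip(self.enabled, self.data)]


array = EnableStackArray([-2, 3, -5, 7])
array.test(lambda x: x < 0)   # IF x < 0
array.apply(lambda x: -x)     # THEN x = -x
array.flip()                  # ELSE
array.apply(lambda x: x * 10) #      x = x * 10
array.end()                   # END IF
if_else_result = array.data
```

The sketch shows why all three elements are needed together in such a scheme: without
the saved mask, `flip` could not tell which PEs belong to the enclosing scope.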

[0037] Generally, as background for this technical field and for this invention, a
parallel array processor comprising 100 to 1000 pickets (PE) is described in the related
patent application cited above. There are many reasons to group pickets in some fashion
so that processing can be done by one or more selected groups of pickets. For example,
when the array processor holds several different jobs, or very different portions within
the same job, this selection process can become not only burdensome but also very
time-consuming. For example, one portion of a geometric sine/cosine problem may need to
compute SIN(x), while another portion of the same problem may need to compute COS(x).
Therefore, the group that computes COS is made inactive while SIN is being computed, and
then the group that computes SIN is made inactive while COS is being computed.

[0038] However, a difficulty arises because whether a value belongs to the SIN group or
the COS group is a result of the angle that was just calculated. Assignment to the SIN or
COS group therefore needs to be very dynamic. Pickets may belong to two or more groups,
and may belong to different groups from moment to moment. Because the state of each
picket may change dynamically, it is much more efficient for the pickets to reassign
themselves dynamically.
[0039] The insight of this invention on this point is that the array controller should
not be used to reassign the pickets to groups, or to know in real time which picket
belongs to which group.
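
The SIN/COS example can be sketched in Python as follows. This is a minimal illustration
under stated assumptions: the `Picket` class, the group constants, and in particular the
rule that decides group membership (angles up to pi/4 join the SIN group) are
hypothetical and chosen only to make the example concrete. What the sketch aims to show
is that each picket assigns itself from its own data, while the controller broadcasts to
a group without tracking its members.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional

SIN_GROUP, COS_GROUP = 0, 1


@dataclass
class Picket:
    angle: float
    group: int = SIN_GROUP
    result: Optional[float] = None

    def assign_group(self) -> None:
        # Hypothetical data-dependent rule, chosen only for illustration.
        self.group = SIN_GROUP if self.angle <= math.pi / 4 else COS_GROUP

    def execute(self, group: int, op: Callable[[float], float]) -> None:
        # A picket outside the addressed group stays inactive for this step.
        if self.group == group:
            self.result = op(self.angle)


pickets: List[Picket] = [
    Picket(a) for a in (0.0, math.pi / 6, math.pi / 3, math.pi / 2)
]

# Each picket decides its own group; the controller is not involved.
for p in pickets:
    p.assign_group()

# The controller broadcasts one step per group to every picket.
for p in pickets:
    p.execute(SIN_GROUP, math.sin)
for p in pickets:
    p.execute(COS_GROUP, math.cos)

groups = [p.group for p in pickets]
results = [p.result for p in pickets]
```

If a picket's angle changes, calling `assign_group` again moves it to the other group
without any bookkeeping on the controller side.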
[0040] 

[Problem to be solved by the invention] Therefore, the purpose of this invention is to
provide a mechanism for effectively performing dynamic and autonomous grouping of two or
more pickets (PE).
[0041] 

[Means for solving problem] This invention relates to providing a mechanism for grouping
two or more pickets in a SIMD/MIMD array and, more specifically, to grouping two or more
pickets (PE: processing elements) in a SIMD computer during execution of a SIMD or SIMIMD
program. Many aspects of the preferred embodiment are described in the related patent
application cited above. The improvements of this invention include a structure for
assigning pickets to groups as an array function executed in parallel by all the active
pickets in the array processor, and a mechanism for using grouping to select the pickets
that should perform a calculation peculiar to the problem of one group.

[0042]These improvements are achieved by providing an array processor with a mechanism for
grouping a plurality of pickets in a SIMD/MIMD array. This array processor has one array
controller and a plurality of pickets that can function in a SIMD operating mode. Each
picket has one ALU, a plurality of registers, and one local memory, and is interconnected
with other pickets during operation of the array processor. Each picket can dynamically
assign itself to one or more groups, so that data is processed individually within the
pickets that belong to a group. In every clock cycle, all the pickets in the array
processor receive one instruction from the array controller and execute it. A given
instruction can be interpreted so as to produce a different operation within each picket,
and during such operation each picket can operate in a SIMIMD operating mode. As a result
of SIMD instructions from the array controller, a function local to a picket either
permits that picket to participate in processing a SIMD instruction stream or prohibits it
from participating.
[0043]This array processor has a mechanism for local autonomy: a doze (dormant) mode for
preserving a picket's internal state, and a doze latch. The doze latch can be changed by
the result of a computation within a picket. The picket operations that can change the
doze latch include LOAD, SET, and RESET instructions that operate on data read from the
picket's local memory or on computed data.

[0044]According to this invention, a picket's state is preserved by not permitting stores
to the local memory of that picket. Other operations, including reads of the picket's
local memory and arithmetic computation, can continue, with only the relevant result being
transferred to the doze latch.

[0045]As a result of this grouping, the pickets in the array processor are divided into
groups according to the type of problem each group handles. Each picket has a means for
assigning itself to one or more groups that are working on one problem.

[0046]By creating new chips and systems designed according to the new concepts of this
invention, a new technique for building massively parallel computers and other computers
is provided. This invention is directed to such systems.
[0047]This Description covers both the picket processor and the advanced parallel array
processor (APAP). An interesting point is that a picket processor can use a PME (processor
memory element). A picket processor is especially useful in military applications that
call for a very small array processor. In this respect the picket processor differs
somewhat from the embodiment of this invention associated with the APAP. However, since
the two processors are similar, many aspects and features provided by this invention can
be used in both.
[0048]A picket is equivalent to 1/n of an array processor formed from processor elements,
memory elements, and interconnection elements. This picket concept is also applicable to
1/n of an APAP.

[0049]When a picket processor is compared with an APAP, the two may differ in data width,
memory size, and number of registers. As massively parallel computers that are
alternatives to each other, they also differ in that the picket processor has
connectivity to 1/n of a regular array, whereas a PME in the APAP is part of a subarray.
Both systems can perform SIMIMD. However, the picket processor is configured as a SIMD
computer with MIMD-type PEs and can therefore perform SIMIMD directly, whereas the APAP,
configured as a MIMD computer, performs SIMIMD by using MIMD-type PEs that are controlled
so as to emulate SIMD. Both computers use PMEs.

[0050]Both systems can be configured as a parallel array processor comprising n PEs and an
interconnection network that interconnects them. In this case, 1/n of the array processor
comprises one PE and its associated memory, one control bus interface, and a portion of
the interconnection network.

[0051]Because this parallel array processor has dual operating modes, its processing unit
can be commanded to operate in either of two modes, SIMD operation or MIMD operation, and
to move freely between them. When the SIMD mode is selected, the processing unit can
command each PE to execute its own instructions in SIMIMD mode. When the MIMD mode is
selected, the processing unit can synchronize selected PEs so that they simulate MIMD
execution. This is called MIMD-SIMD.
[0052]The interconnection network of the parallel array processor in both systems provides
paths for passing information between PEs. Information can be moved in two ways. In the
first method, the data being moved carries no address, so the array controller directs all
messages to move simultaneously in the same direction. In the second method, each message
is self-routed according to the address defined by the header at the start of the message.
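
The two movement methods can be contrasted with a small Python sketch. The one-dimensional
ring layout and the names `shift_all` and `self_route` are assumptions made purely for
illustration; the text above does not fix a topology or a message format.

```python
def shift_all(values, step=1):
    """Method 1 (assumed ring): every PE passes its datum `step` places in the same direction.

    No message carries an address; the controller alone chooses the direction.
    """
    n = len(values)
    return [values[(i - step) % n] for i in range(n)]


def self_route(messages, n):
    """Method 2: each message is a (destination, payload) pair and routes itself.

    Returns one inbox list per PE, filled according to the header address.
    """
    inbox = [[] for _ in range(n)]
    for dest, payload in messages:
        inbox[dest].append(payload)
    return inbox
```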

[0053]A segment of this parallel array processor has a plurality of copies of the
processing unit provided on a single semiconductor chip. Each segment includes the portion
of the interconnection network associated with that segment, buffers, and a multiplexer,
together with a control section that extends the interconnection network and allows the
segment to be connected seamlessly with other segments.

[0054]A control bus from the controller is provided for each processing unit. This control
bus extends to each PE and controls its operation.

[0055]Each processing element segment of the parallel array processor contains a plurality
of copies of the processor memory element within a single semiconductor chip, and includes
a portion of the array control bus and register buffers to support communication of
control to the array segment contained in that chip.

[0056]Both systems can perform mesh moves or routed moves. The APAP normally implements a
dual interconnection structure: one level interconnects the eight PEs (or PMEs) on a
semiconductor chip, and the other interconnects multiple semiconductor chips with one
another. As noted above, programmable routing on the chip generally establishes the links
between PEs (or PMEs), whereas the nodes are connected by other means. The usual on-chip
topology is a 2x4 mesh, and the interconnection of the nodes can be routed. Since both
systems have an interconnection network between PEs (or PMEs), they allow a matrix to be
made up of a plurality of point-to-point networks.
[0057] 

[Working example]The improvement of this invention for grouping a plurality of pickets in
a SIMD computer during execution of a SIMD or SIMIMD program is shown in drawing 1. More
specifically, drawing 1 shows the replicatable processing elements, generally called PEs,
of an array processor. Each replicatable element contains one processor element (an ALU
plus a plurality of registers), one local memory, and a means for interconnecting it with
the other replicatable processing elements in the array processor. These replicatable
processing elements are controlled in SIMD mode by one array controller, which controls
the functions of the set of replicatable processing elements that make up the array
processor. Each replicatable processing element is one node of the array processor. In
some systems, each node comprises a group of chips. In the preferred embodiment of this
invention, each node can be regarded as comprising a set of pickets, as described in the
related patent applications cited above. Each picket is one of a set of processing
elements carried on a single chip. Although the set can receive common control in SIMD
mode, the elements can also function in MIMD mode. In less advanced systems, the
processing elements can be used as individual processor elements (ALU, registers, memory,
I/O). In that case, they can be placed on a single chip provided with external
communication means, or each can be replicated as an independent chip element. The
grouping concept of this invention is especially advantageous when a set of processing
elements is carried on a single chip. This invention can be used in a massively parallel
array processor.
[0058]Although the related patent applications cited above describe many aspects of the
preferred embodiment, the improvement of this invention includes, as an array function
performed in parallel by all the active pickets in an array processor, a structure for
assigning pickets to groups and a mechanism for using the grouping to select the pickets
that are to perform a computation specific to one group's problem. The concept for
controlling the grouping of the pickets is shown in drawing 1. In a memory location
assigned to each group, each picket sets or resets one bit to indicate its participation
in that group. This memory location is preferably part of the local memory directly
connected to the individual picket. The memory location preferably forms part of the
replicatable picket; however, in a system that assigns one part of a shared memory as
global memory and another part as local memory, the local memory portion corresponds to
this memory location. A group initially comprises all the pickets that share a specific
similarity. There are many reasons for grouping pickets in one way or another. One example
is when the array processor handles several different jobs, or when a single job contains
several very different portions.

[0059]Pickets can assign themselves to one or more of several groups, and processing can
proceed on the basis of that grouping. It is preferable to have as many pickets as
possible computing at any given time, but some operations require working with a subset
of the pickets. Local autonomy, already described, is the tool for doing this.

[0060]Next, an interesting method is described by which many pickets make themselves
members of many groups, without the array controller needing to know which picket belongs
to which group.

[0061]Even though a SIMD computer has a high degree of local autonomy, at any given time
it needs to work on one set of similar problems. While the active (participating) pickets
in the array processor work on one problem, the inactive (non-participating) pickets are
placed in the doze state. Each picket placed in doze mode has its activity restricted so
that the problem and data in that picket are not disturbed. A picket in doze mode can
still read its local memory and can perform almost all the operations that the other
pickets perform, but it cannot store into its local memory or registers. As an exception,
the doze bit in the status word can still be stored. A picket can thus load the doze bit
while computing a new state, so as to reactivate itself (recovery).
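
A minimal Python sketch of this doze behavior follows. The class `Picket`, its single
`register`, and the convention that a stored value of 1 means "active" are all assumptions
made for the sketch; the text states only that stores are suppressed while reads continue
and that the doze bit remains writable.

```python
class Picket:
    """One processing element with a local memory, one register, and a doze state."""

    def __init__(self, memory):
        self.memory = list(memory)
        self.register = 0
        self.active = True  # False means the picket is in doze mode

    def read(self, addr):
        # Reading local memory is allowed in either state.
        return self.memory[addr]

    def load_register(self, addr):
        # Register updates are suppressed while dozing.
        if self.active:
            self.register = self.memory[addr]

    def store(self, addr, value):
        # Stores to local memory are suppressed while dozing.
        if self.active:
            self.memory[addr] = value

    def load_doze(self, addr):
        # The one exception: the doze state itself may always be updated,
        # which is how a dozing picket can reactivate itself.
        self.active = self.memory[addr] == 1
```

A dozing picket therefore keeps its problem state intact until a later doze load wakes it.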

[0062]An important feature of grouping according to this invention is that each picket
itself decides whether it belongs to one group or to several groups. Assignment and
reassignment are therefore a single parallel operation that many pickets perform at the
same time. The array controller does not need to know which picket belongs to which group.
When it does need this information, however, it can read the state from each picket. For
this purpose, a register associated with each picket is provided, labeled "execution/doze
state" in drawing 1. This register is located inside the picket, and its contents can be
read by the array controller.
[0063]To establish separate groups, one memory location is reserved for each group inside
each picket. As shown in drawing 1, memory location "xx1" in every picket holds the doze
control bit corresponding to group 1. The computed value to be loaded into the doze bit is
first stored in the memory location corresponding to the appropriate group in each picket.
This value is then placed in the doze bit so as to change the state of the picket.
According to this invention, several groups are identified and set up before the
individual processing begins by loading an appropriate binary pattern (1/0) into the
memory locations corresponding to those groups. To identify a group, it suffices to
examine the carry output and/or zero condition of an operation and load the resulting
value for that group. Ways of setting up groups include the following.

[0064]- Group all pickets that have a carry output of 1.
- Group all pickets in which the contents of a selected memory location are positive.
- Group all pickets in which the contents of a selected memory location equal a specific
broadcast value.
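
The three conditions listed above can be sketched as functions that compute one membership
bit per picket. The function names are hypothetical, and the 8-bit adder width is an
assumption chosen for the example; each list index stands for one picket.

```python
def carry_group(a, b, width=8):
    """Membership bit is 1 where a[i] + b[i] produces a carry out of a `width`-bit adder."""
    limit = 1 << width
    return [1 if x + y >= limit else 0 for x, y in zip(a, b)]


def positive_group(values):
    """Membership bit is 1 where the selected memory location holds a positive value."""
    return [1 if v > 0 else 0 for v in values]


def broadcast_match_group(values, broadcast):
    """Membership bit is 1 where the selected memory location equals the broadcast value."""
    return [1 if v == broadcast else 0 for v in values]
```

In each case the resulting bit would be stored in the memory location reserved for the
group, so the controller never needs to inspect individual results.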

[0065]If a picket belongs to group 2, memory location "xx2" in that picket holds a logic
1; otherwise it holds a logic 0. Three instructions (LOAD/SET/RESET) can be used to change
the doze bit. The first instruction, "LOAD DOZE", changes the active group to the group
comprising all pickets that hold a logic 1 in the specified memory location (for example,
xx3). A picket that was previously on but holds a logic 0 in memory location "xx3" is
turned off. The second instruction, "SET DOZE", adds to the active group all pickets that
hold a logic 1 in the specified memory location (for example, xx4). In this case no picket
is turned off. The third instruction, "RESET DOZE", removes from the active group all
pickets that hold a logic 1 in the specified memory location (for example, xx4).
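
The effect of the three instructions on the active group can be sketched as follows. The
representation is an assumption for illustration: `active` is a list of booleans, one per
picket, and `group_bits` is the list of 0/1 values held in the addressed memory location
of each picket.

```python
def load_doze(active, group_bits):
    """LOAD: the active group becomes exactly the pickets holding a logic 1."""
    return [bool(g) for g in group_bits]


def set_doze(active, group_bits):
    """SET: pickets holding a logic 1 join the active group; none are turned off."""
    return [a or bool(g) for a, g in zip(active, group_bits)]


def reset_doze(active, group_bits):
    """RESET: pickets holding a logic 1 are removed from the active group."""
    return [a and not g for a, g in zip(active, group_bits)]
```

For example, with `active = [True, True, False, False]` and `group_bits = [1, 0, 1, 0]`,
LOAD leaves pickets 0 and 2 active, SET leaves pickets 0, 1, and 2 active, and RESET leaves
only picket 1 active.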

[0066]Using these instructions, that is, LOAD, SET (AND), and RESET (OR), logic functions
can be applied to the logical relationships among existing groups to build new groups
efficiently. A new group can be built using set theory by logically merging two or more
existing groups within the picket data flow.
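
As a sketch of such set-theoretic merging, the following Python function derives a new
group bit from two existing ones inside every picket. The dictionary-per-picket memory
model, the function name `merge_groups`, and the operation names are hypothetical and
serve only to illustrate the idea.

```python
def merge_groups(memories, dst, a, b, op):
    """Write a new group bit `dst` in every picket from existing group bits `a` and `b`.

    memories: one dict per picket mapping a location name (e.g. "xx1") to 0 or 1.
    op: "union", "intersection", or "difference" (members of a that are not in b).
    """
    ops = {
        "union": lambda x, y: x | y,
        "intersection": lambda x, y: x & y,
        "difference": lambda x, y: x & (1 - y),
    }
    combine = ops[op]
    for local in memories:
        # Each picket uses only its own local bits; no picket reads another's memory.
        local[dst] = combine(local[a], local[b])
```

Because every picket works from its own memory locations, the merge is a single parallel
step and the controller still learns nothing about individual membership.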

[0067]Up to this point, the array controller knows that the groups exist, because it
directs the processing done by these groups and switches active processing among them.
However, it has no knowledge at all of which picket belongs to which group, or even of
whether a particular group is actually empty (contains no pickets).
[0068]The array controller can read information about an individual picket when needed by
using other functions. The result of these functions is to yield the address of the target
picket. Three examples follow.

- Find the picket that has the newest value within a given group.

- Find the picket that has the lowest address within a given group.

- Find the picket that has the closest-matching value within a given group.

[0069]In each of these cases, the array controller must isolate the picket that has the
desired value by executing a series of commands. Various inter-picket commands can be used
to isolate the desired picket. At that point, the desired picket is the only picket still
awake. The array controller can then request its address, and can read data from its local
memory or load data into its local memory.
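
One way such an isolation sequence could work for the lowest-address case is a bit-serial
search, sketched below in Python. This is an assumed technique, not one stated in the
text: in particular, the global "is any picket still awake?" test (modeled here with
`any`) is a hypothetical status line, and the function name is chosen for the sketch.

```python
def isolate_lowest_address(active, addr_bits):
    """Narrow the awake pickets of a group down to the one with the lowest address.

    active[i] is True if picket i belongs to the group; len(active) <= 2**addr_bits.
    Returns the surviving address, or None if the group is empty.
    """
    awake = list(active)
    for bit in reversed(range(addr_bits)):
        # Candidates: awake pickets whose own address has a 0 in this bit position.
        zero_here = [awake[i] and not (i >> bit) & 1 for i in range(len(awake))]
        if any(zero_here):      # hypothetical global status test
            awake = zero_here   # pickets with a 1 in this bit go to doze
    survivors = [i for i, a in enumerate(awake) if a]
    return survivors[0] if survivors else None
```

After the loop, at most one picket remains awake, and the controller can ask it for its
address.
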
[0070]The grouping function in an array processor is a powerful tool not only in SIMD
mode but also in other modes, such as SIMIMD.
[0071] 

[Effect of the Invention]As described above, this invention can provide a mechanism for
effectively performing dynamic and autonomous grouping of a plurality of pickets (PEs).

TECHNICAL FIELD 



[Industrial Application]This invention relates to an array processor that includes a
plurality of pickets and, more specifically, to a mechanism for grouping many pickets in a
SIMD/MIMD array processor in order to execute a program on the array processor.

[0002]A parallel associative processor system comprising 100 to 1000 processing elements
is described in related patent application U.S. Serial Number 611594 (corresponding to
Tokuganhei3-278900). However, that patent application does not disclose the improvement of
this invention, which includes a doze latch.
[0003]Explanation of terms
- ALU: The ALU is the arithmetic operation part of each processing element.

[0004]- Array: An array is an arrangement of elements in one or more dimensions. From the
viewpoint of the hardware of a massively parallel computer, an array is usually a set of
identically structured units (processing elements). When data-parallel operations are
performed, each processing element to which an operation has been assigned can, as needed,
carry it out independently and in parallel. In general, an array can be regarded as a grid
of processing elements.

[0005]- Array controller: An array controller is a unit programmed as the controller for
an array processor. It performs the function of a master controller for a grouping of
processing elements arranged in an array processor.
[0006]- Array processor: There are two main types of array processor architecture:
multiple instruction, multiple data stream (MIMD) and single instruction, multiple data
stream (SIMD). In a MIMD array processor, each processing element executes its own unique
instruction stream on its own data. In a SIMD array processor, each processing element is
restricted to the same instruction from a common instruction stream, but the data
associated with each processing element is unique. The preferred array processor of this
invention is called the Advanced Parallel Array Processor (APAP) and has other features as
well.

[0007]- Controller: A controller is a device that directs the transmission of data and
instructions over the links of an interconnection network. Its operation is controlled by
a program executed by a processor to which the controller is connected, or by a program
executed within the controller itself.
[0008]- Link: A link is an element that may be physical or logical. A physical link is a
physical connection joining two or more processing elements.
[0009]- MIMD: MIMD is an array processor architecture in which each processing element has
its own instruction stream, so that the array as a whole has multiple instruction streams,
for processing multiple data streams located one per processing element.

[0010]- Module: A module is a program unit that is discrete and identifiable, or a
functional unit of hardware designed for use with other components. A set of processing
elements contained in a single electronic chip is also called a module.

[0011]- Node: In general, a node is the junction of links. In a generic array of
processing elements (PEs), one PE can be a node. A node can also contain a set of PEs
called a module. In a typical example, each node is preferably formed from eight processor
memory elements (PMEs).

[0012]- Node array: A set of modules made up of PMEs is sometimes called a node array; it
is an array of nodes made up of modules. A node array usually contains more than a few
PMEs.

[0013]- Picket: A picket is an element of an array processor. A picket is equivalent to
1/n of an array processor and takes the form of a PME. Processor logic designed for the
PME chip of this invention can implement the picket logic described in the related patent
applications cited above, or can have the logic for an array processor formed as a node.
The term "picket" is similar to the term "processing element (PE)" commonly used in the
field of array processors. The processing element corresponding to a picket preferably
consists of a processor element combined with a local memory so that several bytes of
information are processed bit-parallel within one clock cycle. In a typical example, a
picket comprises a byte-wide data flow processor (ALU plus registers), a local memory with
a capacity of 32 Kbytes or more, primitive controls, and a means for interconnecting the
picket with other pickets.

[0014]- Picket processor: A picket processor is a total system comprising an array of
pickets, an interconnection network, an I/O system, and a SIMD controller consisting of a
microprocessor, a canned routine processor, and a micro-controller that runs the array.

[0015]- Picket architecture: The picket architecture is the preferred embodiment of a SIMD
architecture, with features that accommodate several different kinds of problems. These
problems include the following.

- Set associative processing
- Parallel numerical processing
- Processing of physical arrays similar to images

- PME: The term PME means "processor memory element." A PME encompasses a picket. A PME is
1/n of an array processor and comprises a processor element with its associated memory
element, a control interface, and a portion of the interconnection network. A PME can have
connectivity to a regular array, as in a picket processor, or can have connectivity as
part of a subarray, as in the case of PME nodes.
[0016]- Routing: Routing is the assignment of a physical path by which a message reaches
its destination. A routing assignment has a source (origin) and a destination. These
elements or addresses have a temporary relationship or affinity. Message routing is often
based on a key obtained by reference to a table of assignments. In a network, a
destination is any addressable processing element, addressed as the destination of the
transmitted information by a path control address that identifies the link. The
destination field of the message header identifies the applicable destination.

[0017]- SIMD: SIMD is an array processor architecture in which all processing elements are
commanded from a single instruction stream to process multiple data streams located one
per processing element.
[0018]- SIMD/MIMD: SIMD/MIMD is a term for a computer that has a dual function, in that it
can switch from MIMD to SIMD for a period of time in order to process some complex
instruction, and therefore has two operating modes. When the massively parallel computer
"Connection Machine 2 (CM-2)" of Thinking Machines Corporation is placed as a front end or
back end of a MIMD computer, the programmer can use the dual operating modes to execute
different portions of one problem. These computers use a bus that interconnects a master
control processor with the other processors. The master control processor has the
capability to interrupt the processing of the other processors. The other processors can
run independent program code. During an interruption, a means for checkpointing (closing
and saving the current state of the controlled processors) must be provided.
[0019]- SIMIMD: SIMIMD is an array processor architecture in which all processing elements
are commanded from a single instruction stream to process multiple data streams located
one per processing element. Within this architecture, data-dependent operations within
each picket that mimic instruction execution are controlled by the SIMD instruction
stream. A SIMIMD computer is a single-instruction-stream computer that has the ability to
sequence multiple instruction streams (one per picket) using the SIMD instruction stream
and to process multiple data streams (one per picket). SIMIMD can be executed by a PME
system.

[0020]- Synchronous operation: Synchronous operation in a MIMD computer is a mode of
operation in which each action is tied to an event. The event is usually a clock, but it
can also be a specific event that occurs regularly within a program sequence. When an
operation is dispatched to a number of PEs, they proceed to perform the function
independently. Control is not returned to the controller until the operation is completed.
If the request is made to an array of processing elements, the controller generates the
request to each processing element in the array, and each must complete its operation
before control is returned to the controller.



TECHNICAL FIELD 



[Industrial ApplicationJThis invention relates to the mechanism for carrying out grouping 
of many pickets in a SIMD/MIMD array processor, in order to execute a program by an 
array processor, if an array processor including two or more pickets is started and also it 
explains in detail. 

[0002]The parallel associative processor system of the place which comprises 100 thru/or 
1000 processing elements is described by related patent application U .S. Serial Number 
61 1594 (it corresponds to Tokuganhei3-278900). However, to this patent application, the 
improvement point of this invention including a douse latch is not indicated at all. 
[0003] Terminological explanation and ALUALU are the arithmetic operation parts of 
each processing element. 

[0004]- An array array is the arrangement of the element in one dimension or the 
dimension beyond it. If it carries out from a viewpoint of the hardware of a super parallel 
computer, usually it is a set of the structure (processing element) which has the same 
composition as an array. When performing parallel operation (operation) of data, each 
processing element can perform these operations respectively independently and in 
parallel if needed, when operation is assigned to each. Generally, an array can be 
considered as a lattice of a processing element. 

[0005]- An array controller array controller is a unit programmed as a controller for array 
processors. An array controller performs the function of the master controller for carrying 
out grouping of the processing element arranged in an array processor. 
[0006]- There are a multi-instruction multi data stream (MIMD) and a single instruction 



multi data stream (SIMD) in the two main architecture of an array processor array 
processor. In an MIMD array processor, each of that processing element performs the 
unique instruction stream of itself about the data of itself On the other hand, although 
each of that processing element is restricted to the same command from a common 
instruction stream in the SIMD array processor, the data relevant to each processing 
element is unique. The suitable array processor of this invention is called an advanced 
parallel array processor (Advanced ParallelArray Processor:APAP), and has other 
features. 

[0007]- A controller controller is a device which orders it transmission of data and a 
command via the link of an interconnection network. The operation is controlled by the 
program which is controlled by the program executed by the processor to which the 
controller concerned is connected, or is executed within the controller concerned. 
[0008]- A link link is a physical or logical element. A physical link is a physical 
connection for combining two or more processing elements. 
[0009]- In order to perform two or more data streams which MIMDMIMD is the 
architecture of an array processor and are positioned one [ at a time ] for every processing 
element. The architecture of the place which each processing element in an array 
processor has an instruction stream of itself, therefore has a multiple instruction stream as 
a whole is meant. 

[0010]- A module module is a functional unit of the hardware designed so that it might be 
individualized and might be used with an identifiable program unit or other components. 

A set of the processing element contained in a single electronic chip is also called a 
module. 

[001 1]- A node is a node of many links at a general node. In the array of a common 
processing element (PE), one PE becomes one node. Each node may include a set of PE 
called a module. It is preferred to form each node from eight processor memory elements 
(PME) in a typical example. 

[0012]- Since a set of the module which comprises the node array PME is called a node 
array, it is an array of the node which comprises two or more modules. There are usually 
more node arrays of 1 than some PME(s). 

[0013]- Picket: A picket is one element of an array processor. A picket corresponds to
1/n of an array processor and is a form of PME. Processor logic designed around the PME
chip of this invention can implement the picket logic described in the related patent
applications, or can provide the logic for an array of processors formed as a node. The
term picket is similar to the term "processing element (PE)" commonly used in the array
processor field. The processing element corresponding to a picket preferably consists of
a processor element combined with a local memory, so that bytes of information are
processed bit-parallel in one clock cycle. In a typical example, a picket comprises a
byte-wide data flow processor (ALU plus registers), a local memory of 32 Kbytes or more,
primitive controls, and a means for interconnecting the picket with other pickets.

[0014]- Picket processor: A picket processor is a total system comprising an array of
pickets, an interconnection network, an I/O system, and a SIMD controller. The SIMD
controller consists of a microprocessor, a canned routine processor, and a
microcontroller that runs the array.

[0015]- Picket architecture: The picket architecture is the preferred embodiment of a
SIMD architecture, with features that accommodate several different kinds of problems.
These problems include the following.

- Set associative processing
- Parallel numerical processing
- Processing of physical arrays similar to images

- PME: The term PME means "processor memory element." A PME encompasses a picket. A PME
is 1/n of an array processor and comprises a processor element, its associated memory
element, a control interface, and a portion of the interconnection network. A PME can
have the connectivity of a regular array, as in a picket processor, or can form part of a
subarray, as in a node of PMEs.
[0016]- Routing: Routing is the assignment of a physical path by which a message reaches
its destination. A routing assignment involves a source (origin) and a destination. These
elements or addresses have a temporary relationship or affinity. Message routing is often
based on a key obtained by reference to a table of assignments. A destination in a
network is any addressable processing element that is addressed as the destination of
information transmitted under a path control address identifying the link. The
destination field of the message header identifies the applicable destination.

[0017]- SIMD: SIMD is an array processor architecture in which all processing elements in
the array are commanded from a single instruction stream to execute multiple data
streams located one per processing element.
[0018]- SIMD/MIMD: SIMD/MIMD is a term for a computer with a dual function that can
switch from MIMD to SIMD for a period of time in order to process some complex
instruction, and that therefore has two operating modes. When the massively parallel
Connection Machine 2 (CM-2) of Thinking Machines Corporation is placed as the front end
or back end of a MIMD computer, the programmer can operate the two modes to execute
different parts of one problem. These computers use a bus that interconnects a master
control processor with the other processors. The master control processor has the
capability to interrupt the processing of the other processors. The other processors can
execute independent program code. During an interruption, provision must be made for
checkpointing (closing and saving the current state of the controlled processors).
[0019]- SIMIMD: SIMIMD is an array processor architecture in which all processing
elements in the array are commanded from a single instruction stream to execute multiple
data streams located one per processing element. Within this architecture, the
data-dependent operations in each picket that mimic instruction execution are controlled
by the SIMD instruction stream. A SIMIMD computer is a single-instruction-stream computer
with the capability to sequence multiple instruction streams (one per picket) using the
SIMD instruction stream, and to process multiple data streams (one per picket). SIMIMD
can be performed by a PME system.

[0020]- Synchronous operation: Synchronous operation in a MIMD computer is a mode of
operation in which each action is tied to an event. The event is usually a clock, but it
may also be a specified event that occurs regularly within a program sequence. When an
operation is dispatched to multiple PEs, those PEs proceed to perform the function
independently. Control is not returned to the controller until the operation is
completed. If the request is made to an array of processing elements, the controller
generates the request to the elements in the array, and those elements must complete
their operations before control is returned to the controller.



PRIOR ART 



[Description of the Prior Art] In the never-ending quest for faster computers, engineers
have come to build massively parallel computers by linking hundreds or even thousands of
low-cost microprocessors in parallel, in order to conquer complex problems that stump
today's computers. This invention relates to a new technique for building a massively
parallel computer. The many improvements of this invention should be considered against
the background of the conventional technology. Choosing the architecture best suited to a
specific application has required system trade-offs, and no satisfactory solution has
existed until now. An object of this invention is to offer a simpler solution. That is,
this invention relates to an array processor that can perform both SIMD and MIMD
operation.

[0022]Various United States patents relating to SIMD computers are outlined below. None
of these patents discloses or suggests the mechanism of this invention, that is, a
mechanism for grouping pickets in a SIMD and MIMD array in order to execute a program on
an array processor.
[0023]US,4783738,B addresses several aspects of autonomy. When a SIMD controller issues
an instruction to all the processing elements (PEs) in an array processor, each PE can
modify the instruction locally as a result of spatial or data-dependent characteristics,
specifically by changing a bit in the instruction or inserting a bit into it. Examples
include the generation of ADD/SUB, SEND/RECEIVE, and OPA/OPB. This function is used in
image processing to determine lines and delimit the boundaries of an image. This patent
resembles one of the autonomous functions that can be used with the grouping of this
invention, in that this invention also has the ALU perform data-dependent operations.
However, this invention does not contemplate changing or inserting a bit of an operation
as this patent does, nor does it contemplate making the ALU function a specific function
of spatial position in the array processor. This patent is also similar to some of the
mechanisms of this invention in that it invokes data-dependent functions. In this patent,
however, the data-dependent items (a sign or condition code) of an instruction sequence
merely force the ALU to do something else, by implementing a DWIM (Do What I Mean)
function.

[0024]US,4736291,B describes an array transform processor that performs high-speed
processing of data arrays. It is optimized for executing the FFT (Fast Fourier Transform)
algorithm in the field of seismic analysis. The focus of this patent is a bulk memory and
a system control bus shared by up to 15 devices. Each device has a writable control
store, a program memory, a control unit, and a device-dependent unit that gives each of
the 15 devices its own characteristics. An array transform processor is not necessarily a
parallel array processor, and this patent makes no mention of one. Array transformation
can also be performed by the system of this invention. However, the processor described
in this patent consists of several subunits (stages) that each perform part of a complex
and repetitive instruction on a data array. In contrast, in the SIMD array processor of
this invention, each of the processing elements takes an element of the data array, and
the data is processed in parallel across those processing elements.

[0025]US,4831519,B describes a SIMD array processor with interprocessor connections
extending to the left and right of each PE, so that multiple 16-bit-wide processing
elements (PEs) can be connected together to handle various data formats efficiently. For
example, when a 64-bit floating point word is processed by four PEs, one PE (the most
significant) processes the exponent part (16 bits) of the word, and the three remaining
PEs, connected to one another, process its fraction (48 bits). Carry/borrow controls are
tied together between PEs to accomplish this. One chip can carry 16 PEs for data
processing, two PEs for address generation, and two spare PEs. The chip I/O of this
patent uses a four-level signaling scheme to combine two logic signals on one pin,
generating one of four voltages depending on the logic conditions of the two lines. The
device of this patent must be supplied with a controller function, but this is hardly
described. To operate on data of various sizes, the array processor of this patent is
supplied with a global MASK that defines how the PEs are to be grouped, and with a local
NEST control that propagates from a master PE to the slave PEs on its right. However,
this patent does not describe at all how to determine which group a PE belongs to. Nor
does it describe local autonomy, in the sense of performing different processing based on
the data being processed rather than under NEST control. By coupling adjacent PEs, this
patent describes a possible design for a horizontally extensible SIMD chip that operates
as a single processor for 16-, 32-, and 48-bit-wide data. However, it neither describes
nor suggests how the processing elements can be coupled, how adjacent PEs operating as a
single processor can process 16-, 32-, or 48-bit-wide data, or how to know or determine
whether a given processing element should participate in a given process. This patent has
grouping dictated by a global control MASK, but it lacks local autonomy. This will become
clear if this patent is re-evaluated after this invention has been understood.
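
The carry linkage between adjacent 16-bit PEs described above for US,4831519,B can be
illustrated with a minimal Python sketch. It is not taken from that patent: the function
names `split_slices`, `add_wide`, and `join_slices`, and the least-significant-first slice
ordering, are assumptions made only for illustration. Each list element stands for the
16-bit slice held by one PE, and the carry out of one PE becomes the carry in of its
neighbor.

```python
SLICE_BITS = 16
SLICE_MASK = (1 << SLICE_BITS) - 1


def split_slices(value, num_pes):
    """Split an unsigned integer into 16-bit slices, least significant first."""
    return [(value >> (SLICE_BITS * i)) & SLICE_MASK for i in range(num_pes)]


def add_wide(a_slices, b_slices):
    """Add two multi-slice operands, rippling the carry from PE to PE.

    Returns the result slices and the final carry out.
    """
    carry = 0
    result = []
    for a, b in zip(a_slices, b_slices):
        total = a + b + carry          # what a single 16-bit PE computes
        result.append(total & SLICE_MASK)
        carry = total >> SLICE_BITS    # handed to the neighboring PE
    return result, carry


def join_slices(slices):
    """Reassemble slices (least significant first) into one integer."""
    return sum(s << (SLICE_BITS * i) for i, s in enumerate(slices))


# Two 48-bit fractions, each held by three linked PEs (hypothetical values).
x = 0x0000FFFFFFFF
y = 0x000000000001
total_slices, carry_out = add_wide(split_slices(x, 3), split_slices(y, 3))
print(hex(join_slices(total_slices)), carry_out)
```

In this sketch the grouping of three slices into one operand is fixed by the caller, which
mirrors grouping imposed from outside rather than chosen by the PEs themselves.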



[0026]US,4783782,B describes a manufacturing test of a SIMD chip for isolating up to two
defective PEs in the SIMD array described in US,4831519,B above. Failure data is stored
in a PROM section on the chip. After the controller reads this failure data, the
resources on the chip can be assigned dynamically. Thus, the chip of this patent has only
limited processor autonomy.

[0027]US,4748585,B describes a mechanism for assigning the elements of a parallel
processor to segments so as to accommodate data of various lengths. Although this patent
is similar to US,4831519,B above, it requires that multiple mini-processors be coupled
and grouped in order to process a word wider than any one of them. Each mini-processor is
complete in that it has a microsequencer, an ALU, registers, and so on. The feature of
this patent is that several mini-processors can be operated in lock step to process a
wide data word. Segmentation is controlled by global controls that use combine codes and
global condition codes. As a MIMD array, this device cannot provide the capabilities of a
SIMD array. This invention, by contrast, is directed to coupling pickets in an improved
manner, not to the controls for a MIMD array that couples mini-processors together to
build a wider processor, as in this patent.

[0028]US,4825359,B describes a processor for processing data arrays, for example for the
Fast Fourier Transform (FFT). The processor contains several processing operators, each
of which can be programmed to perform one step of the computation. It can be classified
as a complex uniprocessor with several processing operators that function as a pipeline
when performing complex processes. This patent describes neither grouping nor autonomy.
It is directed only to improvements that accommodate a wide range of operators.
[0029]US,4905143,B describes an array processor that performs calculations for all
combinations of two types of variables and, using those results, calculates recursive
functions with local data dependency. These calculations are characterized by matching
computations based on the theory of dynamic time warping or dynamic programming, as used
for pattern matching in speech recognition. The processor is intended to function as a
kind of systolic MIMD array. The PEs are arranged in a ring, and each passes intermediate
results to the next PE in the ring. Each PE has its own instruction memory and other
means. This patent does not describe the autonomy of PEs in a SIMD array, nor does it
describe grouping of the PEs, apart from arranging them physically in a ring.

[0030]US,4910665,B describes a two-dimensional interconnection network for a SIMD array
processor in which each PE can directly access its eight neighboring elements. The
communication medium is a dot-connected network that interconnects the four neighboring
elements at each corner. Although this patent discloses a SIMD computer, it does not
describe at all whether local autonomy or grouping of PEs, as in this invention, can or
should be provided.

[0031]US,4925311,B is likewise completely unrelated to realizing a mechanism like that of
this invention. In this patent, the processors in a multiprocessor system can be assigned
to groups that work together on one problem. Each processor can add itself to a group or
remove itself from a group, as well as exchange messages, semaphores, and other controls
with other processors. Being MIMD in nature, this patent describes nothing about giving
local autonomy to the PEs of a SIMD array as in this invention. Instead, each processor
in the multiprocessor system has a network interface controller containing RAM, a
microprocessor, and a functional unit (a disk controller). Grouping is controlled by the
network interface of each processor. This invention does not need such elaborate task
division.

[0032]US,4943912,B describes a MIMD array processor connected to a NEWS network. After
loading programs into the memory of each PE in the array processor, the array controller
issues a procedure start command to identify the PEs that should begin execution. Each PE
contains a register holding task patterns and a comparison means for comparing the PE's
task pattern with the global task pattern command. The result of this comparison is used
to select the program start point in the PE or to put the PE into an idle state. However,
this patent does not suggest autonomy in the PEs of a SIMD array as in this invention.
The comparison means and its result could be used to segregate or group PEs for various
parallel tasks. This invention, by contrast, elaborates on segregating or grouping, and
on how pickets in a SIMD environment are permitted to do this themselves. The effect is
to isolate some of the pickets and make them active while one section of SIMD code is
executed. The computer of this invention can then go on to activate other groups for
processing.

[0033]US,4967340,B describes a systolic array processor. Each processing element in the
array comprises two registers, an adder, a multiplier, and three programmable switches.
To configure the array, the controller sets the switches in each processing element. Data
is then pumped through the processing elements to produce the desired result. This patent
does not suggest the system of this invention.

[0034]US,5005120,B is directed to an array processor in the sense that multiple
processors are present. However, this patent describes only a time compensation circuit
used for aligning data in a bit-serial signal array processor. Each processing element of
the array comprises four bit-serial registers that feed data to an ALU. The time
compensation circuit is placed in front of the first register. This array processor is
completely unrelated to a SIMD computer.
[0035]US,5020059,B concerns a generalized interconnection scheme for array processors
that reconfigures the array around defective PEs and realizes tree and other topologies
within a basically two-dimensional mesh. This patent neither describes nor suggests an
array control architecture, or grouping or autonomous aspects of the PEs, and therefore
does not describe autonomy within a PE either. In a SIMD computer such as that of this
invention, it is conventional to perform exactly the same operation in every picket of
the array processor. Selectively disabling and enabling local autonomy in one or more
pickets is an aspect that may be called fundamental to this invention. Despite all the
efforts of the conventional technologies above, apart from the related patent
applications, no single computer has until now been able to perform SIMD, MIMD, and
SIMIMD processes. The mechanisms needed for SIMIMD processing, floating point arithmetic,
and the like remain largely undeveloped in this technical field.
[0036]US,5045995,B describes a mechanism for enabling or disabling the function of each
PE of a SIMD array based on data conditions in that PE. When a global command is issued
to all PEs, each PE samples a condition internally and enables or disables itself by way
of a status register bit. Another global command effectively swaps the state of the PEs.
These functions can be used to implement IF/THEN/ELSE and WHILE/DO constructs. The state
can also be stacked to support nested enabling conditions. Thus, this patent relates to
the enable/disable functions of this invention. However, this patent requires the
following three elements.

1. An initial test, loading of a status bit, and enabling/disabling based on the status
bit.

2. A command for flipping all the enable/disable bits so that the complementary set of
PEs is enabled.

3. Storage for providing nesting.

This function of the patent is needlessly complex. The mechanism of this invention, by
contrast, can enable and disable many functions without combining all three of these
elements. This invention does not need the flip function described in the patent, yet it
provides a wider range of capability to the system than the patent does. The SIMD
computer of this invention does not require the combination of IF and ELSE instruction
features, or the other features of the patent that relate to video processing.
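
The three elements attributed above to US,5045995,B resemble what is commonly called an
activity-mask stack in SIMD programming. The following Python sketch illustrates that
general idea only; it is not taken from the patent, and the names `PE`, `simd_if`,
`simd_else`, `simd_endif`, and `simd_apply` are hypothetical.

```python
class PE:
    def __init__(self, value):
        self.value = value
        self.enabled = True      # enable/disable status bit
        self.saved = []          # storage that provides nesting


def simd_if(pes, condition):
    """Element 1: test a local condition and enable only PEs that pass it."""
    for pe in pes:
        pe.saved.append(pe.enabled)
        pe.enabled = pe.enabled and condition(pe.value)


def simd_else(pes):
    """Element 2: flip the bits so that the complementary set is enabled."""
    for pe in pes:
        outer = pe.saved[-1]                 # enable state outside this IF
        pe.enabled = outer and not pe.enabled


def simd_endif(pes):
    """Element 3: restore the enable state saved at the matching IF."""
    for pe in pes:
        pe.enabled = pe.saved.pop()


def simd_apply(pes, op):
    """Broadcast one operation; only enabled PEs change their value."""
    for pe in pes:
        if pe.enabled:
            pe.value = op(pe.value)


pes = [PE(v) for v in (-3, 4, -1, 7)]
simd_if(pes, lambda v: v < 0)
simd_apply(pes, lambda v: -v)        # THEN branch: negate the negative values
simd_else(pes)
simd_apply(pes, lambda v: v * 10)    # ELSE branch: scale the remaining values
simd_endif(pes)
print([pe.value for pe in pes])
```

The sketch makes the cost visible: every conditional needs a test, a flip, and a saved
state per PE, which is the combination the text above calls needlessly complex.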

[0037]In general, in view of the needs of this technical field and the technology at
which this invention is aimed, the related patent applications describe a parallel array
processor comprising 100 to 1000 pickets (PEs). There are many reasons for grouping the
pickets in some way so that processing can be done by one or more selected groups of
pickets. For example, when the array processor holds several different jobs, or several
very different parts within the same job, the selection process not only becomes very
cumbersome but also wastes a great deal of time. For example, one part of the geometric
sine/cosine portion of a problem may need to process SIN(x), while another part of the
same problem may need to process COS(x). In that case, SIN is calculated while the group
that calculates COS is made inactive, and then COS is calculated while the group that
calculates SIN is made inactive.
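
The two-pass SIN/COS procedure described above can be sketched as follows. This is a
minimal illustration under stated assumptions, not the mechanism of the invention: the
class `Picket`, the function `broadcast`, and the fixed string labels "SIN" and "COS" are
hypothetical, and group membership is assigned in advance rather than dynamically.

```python
import math


class Picket:
    def __init__(self, angle, group):
        self.angle = angle
        self.group = group    # "SIN" or "COS", fixed in advance in this sketch
        self.result = None


def broadcast(pickets, active_group, op):
    """Issue one operation to the array; pickets outside active_group stay idle."""
    for picket in pickets:
        if picket.group == active_group:
            picket.result = op(picket.angle)


pickets = [Picket(0.0, "SIN"), Picket(0.0, "COS"), Picket(math.pi / 2, "SIN")]
broadcast(pickets, "SIN", math.sin)   # first pass: the COS group is inactive
broadcast(pickets, "COS", math.cos)   # second pass: the SIN group is inactive
results = [p.result for p in pickets]
```

Each pass leaves part of the array idle, and the fixed labels do not follow the data,
which is the limitation that motivates dynamic assignment.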

[0038]However, a difficulty arises from the fact that whether a value belongs in the SIN
group or the COS group is a result of the angle that has only just been calculated.
Assignment to the SIN or COS group therefore needs to be very dynamic. Because pickets
can belong to several groups, and to different groups from moment to moment, it is much
more efficient for the pickets to reassign themselves dynamically as their states change.
[0039]The insight of this invention on this point is that the array controller should not
be used to reassign pickets to each group, or to know in real time which picket belongs
to which group.



EFFECT OF THE INVENTION 



[Effect of the Invention] As described above, this invention provides a mechanism for
effectively performing dynamic and autonomous grouping of pickets (PEs).



TECHNICAL PROBLEM 



[Problem to be solved by the invention] Therefore, the purpose of this invention is to
provide a mechanism for effectively performing dynamic and autonomous grouping of
pickets (PEs).



MEANS 

[Means for solving problem] This invention relates to providing a mechanism for grouping
pickets in a SIMD/MIMD array. More specifically, it concerns grouping pickets (PE:
processing element) in a SIMD computer during execution of a SIMD or SIMIMD program. Many
aspects of the preferred embodiment are described in the related patent applications. The
improvements of this invention include a structure for assigning pickets to groups, as an
array function performed in parallel by all active pickets in the array processor, and a
mechanism for using the grouping to select the pickets that should perform the
calculations specific to the problem of a given group.

[0042]These improvements are achieved by providing an array processor with a mechanism
for grouping pickets in a SIMD/MIMD array. The array processor has an array controller
and a plurality of pickets that can function in SIMD mode. Each picket has an ALU, a
plurality of registers, and a local memory, and is interconnected with other pickets
during operation of the array processor. Each picket can dynamically assign itself to one
or more groups, so that data is processed individually within the pickets belonging to a
group. On every clock cycle, all pickets in the array processor receive an instruction
from the array controller and execute it. Certain instructions can be interpreted so as
to produce a different operation within each picket, and during such operation each
picket can operate in SIMIMD mode. A native function within a picket enables or disables
that picket's participation in the processing of a SIMD instruction stream, as a result
of SIMD instructions from the array controller.
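
Self-assignment to groups, as described above, can be sketched in Python. The abstract of
this document mentions groups 1 to 8 and a memory location per group whose bit is set or
reset; everything else here is an assumption for illustration, including the class
`Picket`, the method names, and the data-dependent rule in `assign_self`.

```python
NUM_GROUPS = 8


class Picket:
    def __init__(self, value):
        self.value = value
        self.group_bits = [False] * NUM_GROUPS   # one location per group

    def assign_self(self):
        """Data-dependent self-assignment (hypothetical rule for illustration)."""
        self.group_bits[0] = self.value < 0       # group 1: negative values
        self.group_bits[1] = self.value >= 0      # group 2: non-negative values
        self.group_bits[2] = self.value % 2 == 0  # group 3: even values

    def participates(self, group):
        return self.group_bits[group - 1]


def execute(pickets, group, op):
    """Broadcast op to every picket; each picket decides locally whether to act."""
    for picket in pickets:
        if picket.participates(group):
            picket.value = op(picket.value)


pickets = [Picket(v) for v in (-4, 3, 8, -5)]
for picket in pickets:
    picket.assign_self()          # no controller bookkeeping is involved
execute(pickets, 1, abs)          # only group 1 acts
for picket in pickets:
    picket.assign_self()          # membership follows the new data
execute(pickets, 3, lambda v: v // 2)
values = [p.value for p in pickets]
```

The controller in this sketch never records which picket is in which group; it only names
a group in the broadcast, and a picket may belong to several groups at once.
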
[0043]This array processor has a mechanism for local autonomy, a doze (dormant) mode for
preserving a picket's internal state, and a doze latch. The doze latch can be changed by
the result of a calculation within a picket. The picket calculations that can change the
doze latch include LOAD/SET/RESET commands, based either on data read from the picket's
local memory or on calculated data.

[0044]According to this invention, a picket's state is preserved by not permitting stores
to the local memory of that picket. Other operations, including reads of the picket's
local memory and arithmetic calculations, can continue, with only the relevant result
being moved to the doze latch.
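
The doze behavior described above can be sketched as follows. This is an illustrative
reading, not the actual hardware: the class `Picket`, the accumulator `acc`, and the
method names are hypothetical, and treating the local memory as the preserved state is
this sketch's interpretation of the text.

```python
class Picket:
    def __init__(self, memory):
        self.memory = list(memory)
        self.acc = 0
        self.doze = False          # doze latch

    def load(self, addr):
        self.acc = self.memory[addr]        # reads continue while dozing

    def add(self, addr):
        self.acc += self.memory[addr]       # arithmetic continues while dozing

    def store(self, addr):
        if not self.doze:                   # stores are suppressed while dozing
            self.memory[addr] = self.acc

    def set_doze(self, test):
        self.doze = bool(test(self.acc))    # latch loaded from a local result


pickets = [Picket([5, 1, 0]), Picket([-2, 1, 0])]
for p in pickets:                     # the same instruction stream for every picket
    p.load(0)
    p.set_doze(lambda acc: acc < 0)   # pickets holding negative data doze
    p.add(1)
    p.store(2)
memories = [p.memory for p in pickets]
```

The second picket still computes a sum, but its local memory is left unchanged because
the store is not permitted while the latch is set.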

[0045]As a result of this grouping, the pickets in the array processor are divided into
groups according to the type of problem each group handles. Each picket has a means for
assigning itself to one or more groups that are working on a problem.

[0046]By creating new chips and systems designed according to the new concept of this
invention, a new technique for building massively parallel computers and other computers
is provided. This invention is directed to such a system.

[0047]This Description covers the picket processor and the advanced parallel array
processor (APAP). It is worth noting that a picket processor can employ a PME (processor
memory element). A picket processor is especially useful in military applications that
call for a very small array processor. In this respect, the picket processor differs
somewhat from the embodiment of this invention associated with the advanced parallel
array processor (APAP). However, because the two processors have much in common, many
aspects and features provided by this invention can be used in both.
[0048]A picket corresponds to 1/n of the elements of an array processor formed from
processor elements, memory elements, and interconnection elements. This picket concept
also applies to 1/n of an APAP.

[0049]Comparing a picket processor with an APAP, the two may differ in data width, memory
size, and number of registers. In the massively parallel embodiment that is an
alternative to an APAP, they also differ in that the former is configured to have the
connectivity of 1/n of a regular array, whereas a PME in the latter is part of a
subarray. Both systems can perform SIMIMD. However, because a picket processor is
configured as a SIMD computer with MIMD-type PEs, it can perform SIMIMD directly, whereas
an APAP configured as a MIMD computer performs SIMIMD by using MIMD-type PEs controlled
to emulate SIMD. Both computers use PMEs.

[0050]Both systems can be configured as a parallel array processor comprising n PEs and
an interconnection network that interconnects those PEs. In this case, 1/n of the array
processor comprises one PE and its associated memory, a control bus interface, and a
portion of the interconnection network.



[0051]This parallel array processor has a dual operating mode: its processing unit can be
commanded to operate in either of two modes, for SIMD operation and for MIMD operation,
and to move freely between them. When the mode for SIMD operation is chosen, the
processing unit can command each PE to execute its own instructions in SIMIMD mode. When
the mode for MIMD operation is chosen, the processing unit can synchronize selected PEs
so that they simulate MIMD execution. This is called MIMD-SIMD.

[0052]The interconnection network provided in the parallel array processor of both
systems has paths for passing information between PEs. Information can be moved in two
ways. In the first method, the array controller directs all messages to move in the same
direction at the same time, so that the data being moved does not define its own
destination. In the second method, each message is self-routed according to a destination
defined by a header at the beginning of the message.
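
The two movement methods described above can be contrasted in a short Python sketch. The
one-dimensional topology, the function names `shift_all` and `self_route`, and the
`(source, destination, payload)` message layout are assumptions chosen for illustration;
the text does not specify them.

```python
def shift_all(slots, step):
    """Method 1: the controller moves every item the same way around a ring.

    The data carries no destination; the item at position i moves to i + step.
    """
    n = len(slots)
    moved = [None] * n
    for i, item in enumerate(slots):
        moved[(i + step) % n] = item
    return moved


def self_route(messages, n):
    """Method 2: each message is (source, destination, payload).

    Every message hops one neighbor at a time toward the destination named
    in its header. Returns the payloads delivered per PE and the hop counts.
    """
    delivered = {pe: [] for pe in range(n)}
    hops = []
    for source, destination, payload in messages:
        position, count = source, 0
        while position != destination:
            position += 1 if destination > position else -1
            count += 1
        delivered[position].append(payload)
        hops.append(count)
    return delivered, hops


ring = shift_all(["a", "b", "c", "d"], 1)
delivered, hops = self_route([(0, 3, "x"), (2, 1, "y"), (1, 1, "z")], 4)
```

In the first function the direction and distance come entirely from the caller, standing
in for the array controller; in the second, each message header alone determines where
the payload ends up.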

[0053]A segment of this parallel array processor has multiple copies of the processing
unit provided on a single semiconductor chip. Each segment includes the portion of the
interconnection network associated with that segment, together with buffers,
multiplexers, and a control section that allow the segment to be connected seamlessly to
other segments so as to extend the interconnection network.

[0054]A control bus from the controller is provided for each processing unit. This
control bus extends to each PE and controls its operation.

[0055]Each processing element segment of the parallel array processor has multiple copies
of a processor memory element contained within a single semiconductor chip, and includes
a portion of the array control bus and the register buffers needed to support the
communication of controls to the array segment contained in that chip.

[0056]Both systems can perform mesh moves or routed moves. Normally, the APAP implements
a dual interconnection structure: the eight PEs (or PMEs) on a semiconductor chip are
related to one another in one way, while the semiconductor chips are related to one
another in another way. Programmable routing on the chip generally establishes the links
between PEs (or PMEs) as described above, but the nodes are connected by another method.
The usual on-chip topology is a 2x4 mesh, and in this case the interconnection between
nodes can be routed. Because both systems have an interconnection network between PEs
(or PMEs), a matrix can be made up of multiple point-to-point paths.



EXAMPLE 



[Working example]Drawing 1 shows the improvement of this invention for grouping two or 
more pickets in a SIMD computer during execution of a SIMD or SIMIMD program. More 
specifically, drawing 1 shows the replicatable processing elements (generally called PEs) 
of an array processor. Each replicatable processing element contains a processor element 
(an ALU plus a plurality of registers), a local memory, and a means for interconnecting 
that replicatable processing element with the other replicatable processing elements in 
the array processor. In SIMD mode, these replicatable processing elements are controlled 
by an array controller. This array controller controls the functions of the set of 
replicatable processing elements that constitute the array processor. Each replicatable 
processing element is a node of the array processor. In some systems, each node comprises 
a group of two or more chips. In the preferred embodiment of this invention, each node 
can be considered to comprise a set of pickets as described in the related patent 
applications cited above. Each picket is one of a set of processing elements mounted on a 
single chip. Although this set of processing elements can be controlled in common in SIMD 
mode, the elements can also function in MIMD mode. In less advanced systems, these 
processing elements can be used as individual processor elements (ALU, registers, memory, 
I/O). In that case, the individual processor elements can be mounted together on a single 
chip provided with an external communication means, or each can be replicated as an 
independent chip element. The grouping concept of this invention is especially 
advantageous when a set of processing elements is mounted on a single chip. This 
invention can be used in a massively parallel array processor. 
[0058]Although the related patent applications cited above describe many aspects of the 
preferred embodiment, the improvement of this invention includes a structure for 
assigning pickets to two or more groups as an array function performed in parallel by all 
the active pickets in the array processor, and a mechanism for using the grouping to 
select the pickets that should perform the computation specific to one group's problem. 
Drawing 1 shows the concept by which each picket controls the grouping of pickets. In a 
memory location assigned to each group, each picket sets or resets a bit to indicate its 
participation in that group. This memory location is preferably part of the local memory 
directly connected with the individual picket. Although this memory location preferably 
forms part of the replicatable picket, in a system that assigns part of a shared memory 
as global memory and another part as local memory, the local memory portion serves as 
this memory location. A group initially comprises all the pickets that have a specific 
similarity. There are many reasons for grouping pickets in this way. One example is when 
the array processor handles several different jobs, or when a single job contains several 
quite different parts. 

[0059]Pickets can assign themselves to one or more of several groups, and processing can 
proceed based on that grouping. The more pickets that are computing at a given time, the 
better, but some operations need to work on a subset of the pickets. Local autonomy is 
the tool for doing this, as already explained. 

[0060]Next, a notable method is explained by which many pickets assign themselves to many 
groups without the array controller needing to know which picket belongs to which group. 

[0061]Even a SIMD computer with advanced local autonomy needs to work on one set of like 
problems at a time. While the active (participating) pickets in the array processor are 
working on one problem, the inactive (non-participating) pickets are placed in the doze 
state. A picket in doze mode has its activity restricted so that its problem and data are 
not disturbed. A picket in doze mode can still read its local memory and can perform 
almost all of the operations that the other pickets perform, but it cannot store into its 
local memory or registers. As an exception, however, the doze bit in the status word can 
still be stored. In this way, the doze bit can be loaded so that a picket can reactivate 
itself when it computes a new (awake) state. 
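
The store restriction of the doze state can be pictured with a small model. This is a sketch under stated assumptions, not the patented hardware: the class `Picket`, its dictionary memory, and the method names `store` and `store_doze` are hypothetical.

```python
class Picket:
    """Toy model of one picket with a local memory and a doze bit."""

    def __init__(self):
        self.memory = {}      # local memory (hypothetical dict model)
        self.dozing = False   # the doze bit held in the status word

    def read(self, addr):
        # Reads are always allowed, even while dozing.
        return self.memory.get(addr, 0)

    def store(self, addr, value):
        # Ordinary stores are suppressed while the picket is dozing.
        if not self.dozing:
            self.memory[addr] = value

    def store_doze(self, value):
        # The exception: the doze bit itself can always be stored,
        # which is how a dozing picket can wake itself again.
        self.dozing = bool(value)


pickets = [Picket() for _ in range(3)]
pickets[1].store_doze(1)              # picket 1 drops out
for p in pickets:                     # the same store is issued to every picket
    p.store("result", 7)
while_dozing = [p.read("result") for p in pickets]

pickets[1].store_doze(0)              # picket 1 rejoins
for p in pickets:
    p.store("result", 9)
after_wake = [p.read("result") for p in pickets]
```

Every picket receives the same instruction stream; only the effect of the store differs according to the doze bit.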

[0062]An important feature of grouping according to this invention is that each picket 
itself decides whether it belongs to one group or to several. The assignment and 
reassignment process is therefore a parallel operation, and many pickets can perform it 
at the same time. The array controller does not need to know which picket belongs to 
which group. When such information is needed, however, the array controller can read the 
state from each picket. This function is provided by the register associated with each 
picket, labeled "execution/doze state" in drawing 1. This register is located inside the 
picket, and its contents can be read by the array controller. 
[0063]To establish several separate groups, one memory location is reserved for each 
group inside each picket. As shown in drawing 1, the memory location "xx1" in every 
picket holds the doze control bit corresponding to group 1. A computed value to be loaded 
into the doze bit is first stored in the memory location corresponding to the appropriate 
group in each picket. This value is then placed in the doze bit to change the state of 
the picket. According to this invention, a suitable binary pattern (1/0) is loaded into 
the memory locations corresponding to the groups in order to identify and set up the 
groups before the individual processing begins. To identify a group, it suffices to 
perform an operation, test its carry output and/or zero condition, and load the resulting 
value for that group. Groups can be set up in many ways, including the following. 

[0064]- Group all the pickets that have a carry output of 1. 

- Group all the pickets in which the contents of a selected memory location are positive. 

- Group all the pickets in which the contents of a selected memory location are equal to 
a specific broadcast value. 
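
The three criteria listed above can be sketched as follows. The 8-bit ALU width, the sample values, and the names `carry_out`, `local_values`, and `broadcast` are assumptions introduced for illustration; the list comprehensions stand in for what every picket would compute locally and in parallel.

```python
WIDTH = 8  # assumed ALU width in bits; not specified in the text


def carry_out(a, b):
    """Return 1 if adding a and b overflows WIDTH bits, else 0."""
    return 1 if (a + b) >> WIDTH else 0


local_values = [200, 3, 100, 0]   # one selected memory location per picket
broadcast = 100                   # value sent to all pickets by the controller

# Each list holds the group bit that the corresponding picket would
# write into the memory location reserved for that group.
group_carry = [carry_out(v, broadcast) for v in local_values]
group_positive = [1 if v > 0 else 0 for v in local_values]
group_equal = [1 if v == broadcast else 0 for v in local_values]
```

No picket needs to know what the others computed, which is why the controller does not have to track membership.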

[0065]If a picket belongs to group 2, the memory location "xx2" in that picket holds a 
logic 1; otherwise it holds a logic 0. Three instructions (LOAD/SET/RESET) can be used to 
change the doze bit. The first instruction, "LOAD DOZE", changes the active group to the 
group comprising all the pickets that hold a logic 1 in the specified memory location 
(for example, xx3). A picket that was previously on but holds a logic 0 in memory 
location "xx3" is turned off. The second instruction, "SET DOZE", adds to the active 
group all the pickets that hold a logic 1 in the specified memory location (for example, 
xx4). In this case, no picket is turned off. The third instruction, "RESET DOZE", removes 
from the active group all the pickets that hold a logic 1 in the specified memory 
location (for example, xx4). 
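
The effect of the three instructions on the active group can be sketched as below. The sketch tracks an "active" flag per picket rather than the raw doze bit, so the bit polarity is an assumption; the function names and the sample contents of xx3 and xx4 are hypothetical.

```python
def load_doze(active, bits):
    """LOAD DOZE: the active group becomes exactly the pickets holding 1."""
    return [bool(b) for b in bits]


def set_doze(active, bits):
    """SET DOZE: pickets holding 1 join the active group; none leave."""
    return [a or bool(b) for a, b in zip(active, bits)]


def reset_doze(active, bits):
    """RESET DOZE: pickets holding 1 leave the active group."""
    return [a and not b for a, b in zip(active, bits)]


xx3 = [1, 0, 1, 0]   # group bit per picket in location xx3 (sample data)
xx4 = [0, 1, 1, 0]   # group bit per picket in location xx4 (sample data)

active = [True, True, True, True]
after_load = load_doze(active, xx3)
after_set = set_doze(after_load, xx4)
after_reset = reset_doze(after_load, xx4)
```

Each function looks only at a picket's own flag and its own memory bit, matching the description that the change is made inside every picket at once.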
[0066]With these three instructions, LOAD, SET (AND), and RESET (OR), logic functions can 
be used effectively to build new groups based on the logical relationships among existing 
groups. A new group can be built using set theory by logically merging two or more 
existing groups within the picket data flow. 
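
Building a new group from existing ones can be sketched as a purely local bit operation in each picket. The locations "xx5" to "xx7" and the sample memberships are hypothetical; the loop stands in for an operation that all pickets would carry out simultaneously.

```python
# One dictionary per picket, holding its group bits (sample data).
pickets = [
    {"xx1": 1, "xx2": 0},
    {"xx1": 1, "xx2": 1},
    {"xx1": 0, "xx2": 1},
    {"xx1": 0, "xx2": 0},
]

for mem in pickets:  # conceptually performed by all pickets in parallel
    mem["xx5"] = mem["xx1"] | mem["xx2"]        # union of groups 1 and 2
    mem["xx6"] = mem["xx1"] & mem["xx2"]        # intersection
    mem["xx7"] = mem["xx1"] & (1 - mem["xx2"])  # in group 1 but not group 2

union = [m["xx5"] for m in pickets]
intersection = [m["xx6"] for m in pickets]
difference = [m["xx7"] for m in pickets]
```

The new group bits can then be used with the doze instructions in the same way as any other group location.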
[0067]Up to this point, the array controller knows that the groups exist, because it is 
involved in the processing with these groups and switches the active processing among 
them. However, the array controller does not know at all which picket belongs to which 
group, or even whether a particular group is in fact empty (contains no pickets). 
[0068]When necessary, the array controller can read information about an individual 
picket by using other functions. The result of these functions is the address of the 
target picket. Three examples follow. 

- Find the picket that has the newest value within a given group. 

- Find the picket that has the minimum address within a given group. 

- Find the picket that has the most closely matching value within a given group. 

[0069]In each of these cases, the array controller must isolate the picket that has the 
desired value by executing a series of commands. Various inter-picket commands can be 
used to isolate the desired picket. At that point, the desired picket is the only picket 
still awake. The array controller can then request its address, and can read data from 
its local memory or load data into its local memory. 
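
One way to picture the isolation step for the minimum-address case is a bit-serial search. This is a swapped-in illustration, not the command sequence of the patent: it assumes that the controller can sense whether any awake picket meets a condition (a global OR signal), and the function name `isolate_lowest_address` is hypothetical.

```python
def isolate_lowest_address(awake, addr_bits):
    """Doze pickets step by step until only the lowest address is awake.

    awake[i] is True if the picket at address i is in the group.
    Assumes the controller can sense "any awake picket has a 0 in this
    address bit" (a global OR); that signal is an assumption.
    """
    awake = list(awake)
    for bit in reversed(range(addr_bits)):       # most significant bit first
        bit_is_zero = [((addr >> bit) & 1) == 0 for addr in range(len(awake))]
        if any(a and z for a, z in zip(awake, bit_is_zero)):
            # Pickets whose address bit is 1 cannot be the minimum: doze them.
            awake = [a and z for a, z in zip(awake, bit_is_zero)]
    return awake


group = [False, False, True, False, True, True, False, True]  # addresses 2, 4, 5, 7
survivors = isolate_lowest_address(group, addr_bits=3)
lowest = survivors.index(True) if any(survivors) else None
```

After the loop at most one picket is still awake, so the controller can ask that picket for its address; an empty group leaves no picket awake.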
[0070]The grouping function in an array processor is a powerful tool not only in SIMD 
mode but also in other modes, such as SIMIMD. 



DESCRIPTION OF DRAWINGS 



[Brief Description of the Drawings] 

[Drawing 1]A figure showing the concept by which each picket controls the grouping of 
pickets. In a memory location assigned to each group, each picket sets or resets a bit 
to indicate its participation in that group. This memory location is preferably part of 
the local memory directly connected with the individual picket. 